Search CORE

51 research outputs found

Consensus Clusters in Robinson-Foulds Reticulation Networks

Author: Eulenstein Oliver
Markin Alexey
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)
Publication date: 01/01/2019
Field of study

Inference of phylogenetic networks - the evolutionary histories of species involving speciation as well as reticulation events - has proved to be an extremely challenging problem even for smaller datasets easily tackled by supertree inference methods. An effective way to boost the scalability of distance-based supertree methods originates from the Pareto (for clusters) property, which is a highly desirable property for phylogenetic consensus methods. In particular, one can employ strict consensus merger algorithms to boost the scalability and accuracy of supertree methods satisfying Pareto; cf. SuperFine. In this work, we establish a Pareto-like property for phylogenetic networks. Then we consider the recently introduced RF-Net method that heuristically solves the so-called RF-Network problem and which was demonstrated to be an efficient and effective tool for the inference of hybridization and reassortment networks. As our main result, we provide a constructive proof (entailing an explicit refinement algorithm) that the Pareto property applies to the RF-Network problem when the solution space is restricted to the popular class of tree-child networks. This result implies that strict consensus merger strategies, similar to SuperFine, can be directly applied to boost both accuracy and scalability of RF-Net significantly. Finally, we further investigate the optimum solutions to the RF-Network problem; in particular, we describe structural properties of all optimum (tree-child) RF-networks in relation to strict consensus clusters of the input trees

Dagstuhl Research Online Publication Server

Efficient error correction algorithms for gene tree reconciliation based on duplication, duplication and loss, and deep coalescence

Author: Burleigh J Gordon
Chaudhary Ruchi
Eulenstein Oliver
Publication venue: BioMed Central
Publication date: 01/01/2012
Field of study

Abstract Background Gene tree - species tree reconciliation problems infer the patterns and processes of gene evolution within a species tree. Gene tree parsimony approaches seek the evolutionary scenario that implies the fewest gene duplications, duplications and losses, or deep coalescence (incomplete lineage sorting) events needed to reconcile a gene tree and a species tree. While a gene tree parsimony approach can be informative about genome evolution and phylogenetics, error in gene trees can profoundly bias the results. Results We introduce efficient algorithms that rapidly search local Subtree Prune and Regraft (SPR) or Tree Bisection and Reconnection (TBR) neighborhoods of a given gene tree to identify a topology that implies the fewest duplications, duplication and losses, or deep coalescence events. These algorithms improve on the current solutions by a factor of <it>n </it>for searching SPR neighborhoods and <it>n</it>2 for searching TBR neighborhoods, where <it>n </it>is the number of taxa in the given gene tree. They provide a fast error correction protocol for ameliorating the effects of gene tree error by allowing small rearrangements in the topology to improve the reconciliation cost. We also demonstrate a simple protocol to use the gene rearrangement algorithm to improve gene tree parsimony phylogenetic analyses. Conclusions The new gene tree rearrangement algorithms provide a fast method to address gene tree error. They do not make assumptions about the underlying processes of genome evolution, and they are amenable to analyses of large-scale genomic data sets. These algorithms are also easily incorporated into gene tree parsimony phylogenetic analyses, potentially producing more credible estimates of reconciliation cost.</p

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Consensus properties for the deep coalescence problem and their application for scalable tree search

Author: Burleigh J Gordon
Eulenstein Oliver
Lin Harris T
Publication venue: BioMed Central
Publication date: 01/01/2012
Field of study

Springer - Publisher Connector

PubMed Central

Improved Heuristics for Minimum-Flip Supertree Construction

Author: Burleigh J. Gordon
Chen Duhong
Eulenstein Oliver
Fernández-Baca David
Publication venue: Libertas Academica
Publication date: 01/01/2006
Field of study

The utility of the matrix representation with flipping (MRF) supertree method has been limited by the speed of its heuristic algorithms. We describe a new heuristic algorithm for MRF supertree construction that improves upon the speed of the previous heuristic by a factor of n (the number of taxa in the supertree). This new heuristic makes MRF tractable for large-scale supertree analyses and allows the first comparisons of MRF with other supertree methods using large empirical data sets. Analyses of three published supertree data sets with between 267 to 571 taxa indicate that MRF supertrees are equally or more similar to the input trees on average than matrix representation with parsimony (MRP) and modified min-cut supertrees. The results also show that large differences may exist between MRF and MRP supertrees and demonstrate that the MRF supertree method is a practical and potentially more accurate alternative to the nearly ubiquitous MRP supertree method

Directory of Open Access Journals

PubMed Central

iGTP: A software package for large-scale gene tree parsimony analysis

Author: Bansal Mukul S
Chaudhary Ruchi
Eulenstein Oliver
Fernández-Baca David
Wehe André
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Abstract Background The ever-increasing wealth of genomic sequence information provides an unprecedented opportunity for large-scale phylogenetic analysis. However, species phylogeny inference is obfuscated by incongruence among gene trees due to evolutionary events such as gene duplication and loss, incomplete lineage sorting (deep coalescence), and horizontal gene transfer. Gene tree parsimony (GTP) addresses this issue by seeking a species tree that requires the minimum number of evolutionary events to reconcile a given set of incongruent gene trees. Despite its promise, the use of gene tree parsimony has been limited by the fact that existing software is either not fast enough to tackle large data sets or is restricted in the range of evolutionary events it can handle. Results We introduce iGTP, a platform-independent software program that implements state-of-the-art algorithms that greatly speed up species tree inference under the duplication, duplication-loss, and deep coalescence reconciliation costs. iGTP significantly extends and improves the functionality and performance of existing gene tree parsimony software and offers advanced features such as building effective initial trees using stepwise leaf addition and the ability to have unrooted gene trees in the input. Moreover, iGTP provides a user-friendly graphical interface with integrated tree visualization software to facilitate analysis of the results. Conclusions iGTP enables, for the first time, gene tree parsimony analyses of thousands of genes from hundreds of taxa using the duplication, duplication-loss, and deep coalescence reconciliation costs, all from within a convenient graphical user interface.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

An ILP solution for the gene duplication problem

Author: Burleigh Gordon J
Chang Wen-Chieh
Eulenstein Oliver
Fernández-Baca David F
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Abstract Background The gene duplication (GD) problem seeks a species tree that implies the fewest gene duplication events across a given collection of gene trees. Solving this problem makes it possible to use large gene families with complex histories of duplication and loss to infer phylogenetic trees. However, the GD problem is NP-hard, and therefore, most analyses use heuristics that lack any performance guarantee. Results We describe the first integer linear programming (ILP) formulation to solve instances of the gene duplication problem exactly. With simulations, we demonstrate that the ILP solution can solve problem instances with up to 14 taxa. Furthermore, we apply the new ILP solution to solve the gene duplication problem for the seed plant phylogeny using a 12-taxon, 6, 084-gene data set. The unique, optimal solution, which places Gnetales sister to the conifers, represents a new, large-scale genomic perspective on one of the most puzzling questions in plant systematics. Conclusions Although the GD problem is NP-hard, our novel ILP solution for it can solve instances with data sets consisting of as many as 14 taxa and 1, 000 genes in a few hours. These are the largest instances that have been solved to optimally to date. Thus, this work can provide large-scale genomic perspectives on phylogenetic questions that previously could only be addressed by heuristic estimates.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Robinson-Foulds Supertrees

Author: David Fernández-baca
J Gordon Burleigh
Mukul S Bansal
Oliver Eulenstein
Robinson-foulds Supertrees
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Abstract Background Supertree methods synthesize collections of small phylogenetic trees with incomplete taxon overlap into comprehensive trees, or supertrees, that include all taxa found in the input trees. Supertree methods based on the well established Robinson-Foulds (RF) distance have the potential to build supertrees that retain much information from the input trees. Specifically, the RF supertree problem seeks a binary supertree that minimizes the sum of the RF distances from the supertree to the input trees. Thus, an RF supertree is a supertree that is consistent with the largest number of clusters (or clades) from the input trees. Results We introduce efficient, local search based, hill-climbing heuristics for the intrinsically hard RF supertree problem on rooted trees. These heuristics use novel non-trivial algorithms for the SPR and TBR local search problems which improve on the time complexity of the best known (naïve) solutions by a factor of Θ(<it>n</it>) and Θ(<it>n</it>2) respectively (where <it>n </it>is the number of taxa, or leaves, in the supertree). We use an implementation of our new algorithms to examine the performance of the RF supertree method and compare it to matrix representation with parsimony (MRP) and the triplet supertree method using four supertree data sets. Not only did our RF heuristic provide fast estimates of RF supertrees in all data sets, but the RF supertrees also retained more of the information from the input trees (based on the RF distance) than the other supertree methods. Conclusions Our heuristics for the RF supertree problem, based on our new local search algorithms, make it possible for the first time to estimate large supertrees by directly optimizing the RF distance from rooted input trees to the supertrees. This provides a new and fast method to build accurate supertrees. RF supertrees may also be useful for estimating majority-rule(-) supertrees, which are a generalization of majority-rule consensus trees.</p

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Algorithms: simultaneous error-correction and rooting for gene tree reconciliation and the gene duplication problem

Author: A Wehe
B Behzadi
B Ma
B Mirkin
C Notredame
D Durand
D Graur
DJ Sherman
K Chen
M Goodman
M Sanderson
MA Bender
MR Fellows
MS Bansal
MS Bansal
MW Hahn
O Eulenstein
O Eulenstein
Oliver Eulenstein
P Bonizzoni
P Górecki
P Górecki
P Górecki
Pawel Górecki
R Guigó
RDM Page
RDM Page
Publication venue: BioMed Central
Publication date: 01/01/2012
Field of study

Abstract Background Evolutionary methods are increasingly challenged by the wealth of fast growing resources of genomic sequence information. Evolutionary events, like gene duplication, loss, and deep coalescence, account more then ever for incongruence between gene trees and the actual species tree. Gene tree reconciliation is addressing this fundamental problem by invoking the minimum number of gene duplication and losses that reconcile a rooted gene tree with a rooted species tree. However, the reconciliation process is highly sensitive to topological error or wrong rooting of the gene tree, a condition that is not met by most gene trees in practice. Thus, despite the promises of gene tree reconciliation, its applicability in practice is severely limited. Results We introduce the problem of reconciling unrooted and erroneous gene trees by simultaneously rooting and error-correcting them, and describe an efficient algorithm for this problem. Moreover, we introduce an error-corrected version of the gene duplication problem, a standard application of gene tree reconciliation. We introduce an effective heuristic for our error-corrected version of the gene duplication problem, given that the original version of this problem is NP-hard. Our experimental results suggest that our error-correcting approaches for unrooted input trees can significantly improve on the accuracy of gene tree reconciliation, and the species tree inference under the gene duplication problem. Furthermore, the efficiency of our algorithm for error-correcting reconciliation is capable of handling truly large-scale phylogenetic studies. Conclusions Our presented error-correction approach is a crucial step towards making gene tree reconciliation more robust, and thus to improve on the accuracy of applications that fundamentally rely on gene tree reconciliation, like the inference of gene-duplication supertrees.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central